19 research outputs found
Sparsity in Dynamics of Spontaneous Subtle Emotions: Analysis & Application
Spontaneous subtle emotions are expressed through micro-expressions, which
are tiny, sudden and short-lived dynamics of facial muscles, and thus pose a
great challenge for visual recognition. The abrupt dynamics that matter for the
recognition task are temporally sparse, while the remaining, irrelevant
dynamics are temporally redundant. In this work, we analyze and enforce
sparsity constraints to learn significant temporal and spectral structures
while eliminating irrelevant facial dynamics of micro-expressions, which would
ease the challenge in the visual recognition of spontaneous subtle emotions.
The hypothesis is confirmed through experimental results of automatic
spontaneous subtle emotion recognition with several sparsity levels on CASME II
and SMIC, the only two publicly available spontaneous subtle emotion databases.
The overall performance of automatic subtle emotion recognition is boosted when
only significant dynamics are preserved from the original sequences.
Comment: IEEE Transactions on Affective Computing (2016)
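As a rough illustration of what "preserving only significant dynamics" at a chosen sparsity level can mean, the sketch below keeps the largest-magnitude entries of a one-dimensional motion signal and zeroes the rest. This is plain hard thresholding, used here as a stand-in; it is not the sparsity-constrained learning procedure of the paper, and the function name, the `sparsity_level` parameter and the sample values are all hypothetical.

```python
def keep_significant_dynamics(signal, sparsity_level):
    """Zero all but the largest-magnitude entries of a 1-D signal.

    sparsity_level is the fraction of entries to keep (0 < level <= 1).
    Hypothetical helper: simple hard thresholding, not the paper's method.
    """
    if not 0 < sparsity_level <= 1:
        raise ValueError("sparsity_level must be in (0, 1]")
    n_keep = max(1, round(sparsity_level * len(signal)))
    # Rank indices by absolute value, largest first.
    ranked = sorted(range(len(signal)), key=lambda i: abs(signal[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [x if i in keep else 0.0 for i, x in enumerate(signal)]


# Made-up frame-to-frame motion magnitudes: mostly small, one abrupt burst.
motion = [0.1, -0.2, 0.1, 3.0, -2.5, 0.2, 0.1, -0.1]
sparse_motion = keep_significant_dynamics(motion, 0.25)
```

With a sparsity level of 0.25, only the two entries belonging to the abrupt burst survive; the small fluctuations around it are discarded.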
Digital system for bio-inspired visual attention processing: fast and efficient information theoretic modelling of saliency
Visual attention is a biological mechanism of human vision systems to cope with rich and fast-changing visual information in surrounding environments. Visual saliency is a strategy which recommends attentive spots to be visited in descending order of interest or information content. This thesis aims to utilize information theory in computational saliency models, assuming that more attention is drawn toward more informative locations.
As visual media, i.e. images and videos, are high-dimensional data, information estimation is often computationally infeasible due to the enormous amount of computation and data samples required. This thesis proposes and analyses three different practical and innovative information-based saliency models.
The first model, called entropy-based saliency method (ENT), measures salient information with a centre-surround operation using conditional entropy (ENT-CON) or Kullback-Leibler divergence (ENT-KLD). However, ENT only estimates information from local features of fixed-size windows; it does not utilize multi-scale and global information of visual media, which have been shown to be important in biological visual attention.
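The centre-surround Kullback-Leibler idea can be sketched as follows: build an intensity histogram for a small centre window and for the ring of pixels around it, then score the location by the divergence between the two. This is a minimal sketch under stated assumptions, not the thesis's ENT-KLD implementation: intensities are assumed to lie in [0, 1], and the window radii, bin count, smoothing constant and all function names are hypothetical.

```python
import math


def histogram(values, n_bins=8, eps=1e-6):
    """Smoothed, normalised histogram of intensities assumed to lie in [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(n_bins - 1, max(0, int(v * n_bins)))] += 1
    total = len(values) + eps * n_bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def window(image, row, col, radius):
    """Pixel coordinates of a square window, clipped at the image border."""
    h, w = len(image), len(image[0])
    return [(r, c)
            for r in range(max(0, row - radius), min(h, row + radius + 1))
            for c in range(max(0, col - radius), min(w, col + radius + 1))]


def centre_surround_kld(image, row, col, centre_radius=1, surround_radius=3):
    """Saliency at (row, col): divergence of centre statistics from the surround."""
    centre_px = set(window(image, row, col, centre_radius))
    surround_px = [p for p in window(image, row, col, surround_radius)
                   if p not in centre_px]
    p = histogram([image[r][c] for r, c in centre_px])
    q = histogram([image[r][c] for r, c in surround_px])
    return kl_divergence(p, q)


# Made-up 7x7 test images: a bright 3x3 patch on a dark background, and a flat image.
patch = [[0.9 if 2 <= r <= 4 and 2 <= c <= 4 else 0.2 for c in range(7)]
         for r in range(7)]
flat = [[0.2] * 7 for _ in range(7)]
salient_score = centre_surround_kld(patch, 3, 3)
flat_score = centre_surround_kld(flat, 3, 3)
```

The patch centre scores far higher than the same location in the flat image, where centre and surround histograms nearly coincide. Because both windows have a fixed size, the score sees only one scale, which is the limitation the paragraph above points out.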
To utilise multi-scale information, Wavelet-based Scale-Saliency (WSS), the second model, estimates information from the power distribution of data across wavelet sub-band basis descriptors at multiple dyadic scales. Though WSS benefits from local features at multiple scales, it does not integrate information about global context or the statistical characteristics of natural images.
Multiscale Discriminant Saliency (MDIS), the third model, adopts the Wavelet Hidden Markov Tree (WHMT) to unify multi-scale and global information in a comprehensive saliency method. All three models, ENT, WSS and MDIS, are evaluated against well-known saliency methods such as PSS, AIM and DIS, both quantitatively, using standard numerical tools (Normalized Scanpath Saliency (NSS), Linear Correlation Coefficient (LCC), Area Under Curve (AUC)) on N. Bruce’s, Kootstra’s and Judd’s databases with human eye-tracking ground truth, and qualitatively, by visual examination of individual cases. The performance and comprehensiveness of the three models are reflected in the numerical results of an experiment on Bruce’s database. As each later model is designed in a more comprehensive and computationally complex manner than the previous one, the three quantitative measures (LCC, NSS, AUC) generally increase in that order, as does the computational time.
                     ENT        WSS       MDIS
LCC              0.02263   -0.01731    0.02382
NSS             -0.17533    0.31782    0.48019
AUC              0.78167    0.70292    0.88335
TIME (s/frame)   0.87040    1.26889    2.32734
Table 1: Quantitative results of ENT, WSS and MDIS on N. Bruce’s database
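For readers unfamiliar with the measures in Table 1, the sketch below shows how NSS and LCC are commonly defined in the eye-tracking literature: NSS is the mean z-scored saliency at fixated pixels, and LCC is the Pearson correlation between a saliency map and a fixation density map. It is a minimal illustration, not the evaluation code used for the table; the function names and the tiny 3x3 map are hypothetical, and AUC is omitted for brevity.

```python
import math
import statistics


def nss(saliency, fixations):
    """Mean z-scored saliency value at the fixated (row, col) pixels."""
    flat = [v for row in saliency for v in row]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    if std == 0:
        raise ValueError("saliency map is constant")
    return statistics.fmean([(saliency[r][c] - mean) / std for r, c in fixations])


def lcc(map_a, map_b):
    """Pearson linear correlation coefficient between two equally sized maps."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up saliency map with a single peak in the middle.
saliency_map = [[0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 0.0]]
hit = nss(saliency_map, [(1, 1)])    # fixation on the peak
miss = nss(saliency_map, [(0, 0)])   # fixation away from the peak
```

A fixation on the peak gives a positive NSS and a fixation elsewhere gives a negative one, so a negative entry in the table indicates that fixated locations received below-average saliency.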
Multi-scale Discriminant Saliency with Wavelet-based Hidden Markov Tree Modelling
The bottom-up saliency, an early stage of humans' visual attention, can be
considered as a binary classification problem between centre and surround
classes. Discriminant power of features for the classification is measured as
mutual information between distributions of image features and corresponding
classes. As the estimated discrepancy depends strongly on the considered scale
level, multi-scale structure and discriminant power are integrated by employing
discrete wavelet features and Hidden Markov Tree (HMT). With wavelet
coefficients and Hidden Markov Tree parameters, quad-tree-like label structures
are constructed and used in maximum a posteriori (MAP) estimation of hidden
class variables at the corresponding dyadic sub-squares. Then, a saliency value
for each square block at each scale level is computed with the discriminant
power principle. Finally, the final saliency map is integrated across multiple
scales by an information maximization rule. Both standard quantitative tools such as
NSS, LCC, AUC and qualitative assessments are used for evaluating the proposed
multi-scale discriminant saliency (MDIS) method against the well-known
information-based approach AIM on its released image collection with
eye-tracking data. Simulation results are presented and analysed to verify the
validity of MDIS as well as to point out its limitations for further research
directions.
Comment: arXiv admin note: substantial text overlap with arXiv:1301.396
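The discriminant-power measure described in this abstract, mutual information between feature values and centre/surround class labels, can be illustrated with a small empirical estimate. The sketch assumes the features have already been quantised into discrete values and counts co-occurrences directly; it does not model the wavelet Hidden Markov Tree or the MAP labelling, and the function name and sample values are hypothetical.

```python
import math
from collections import Counter


def mutual_information(centre_features, surround_features):
    """Empirical I(X; Y) in bits between a quantised feature X and class Y.

    Y = 1 for samples from the centre window, Y = 0 for the surround.
    """
    samples = ([(x, 1) for x in centre_features]
               + [(x, 0) for x in surround_features])
    n = len(samples)
    joint = Counter(samples)
    p_x = Counter(x for x, _ in samples)
    p_y = Counter(y for _, y in samples)
    # p(x,y) / (p(x) p(y)) simplifies to count * n / (count_x * count_y).
    return sum((c / n) * math.log2(c * n / (p_x[x] * p_y[y]))
               for (x, y), c in joint.items())


# Made-up quantised features.
separable = mutual_information([3, 3, 3, 3], [0, 0, 0, 0])
identical = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
```

When the centre and surround features never overlap, the feature fully identifies the class and the estimate reaches one bit; when both windows share the same feature distribution, it drops to zero and the location is not salient under this measure.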
Spontaneous Subtle Expression Detection and Recognition based on Facial Strain
Optical strain is an extension of optical flow that is capable of quantifying
subtle changes on faces and representing the minute facial motion intensities
at the pixel level. This is computationally essential for the relatively new
field of spontaneous micro-expression, where subtle expressions can be
technically challenging to pinpoint. In this paper, we present a novel method
for detecting and recognizing micro-expressions by utilizing facial optical
strain magnitudes to construct optical strain features and optical strain
weighted features. The two sets of features are then concatenated to form the
resultant feature histogram. Experiments were performed on the CASME II and
SMIC databases. We demonstrate, on both databases, the usefulness of optical
strain information and more importantly, that our best approaches are able to
outperform the original baseline results for both detection and recognition
tasks. A comparison of the proposed method with other existing spatio-temporal
feature extraction approaches is also presented.
Comment: 21 pages (including references), single column format, accepted to
Signal Processing: Image Communication journal
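Optical strain magnitude can be derived from an optical flow field using the standard infinitesimal strain tensor, whose components are spatial derivatives of the flow. The sketch below assumes a dense flow given as two grids `u` (horizontal) and `v` (vertical) and approximates the derivatives with finite differences; it is an illustration of that definition rather than the feature pipeline of the paper, and the function names and the toy flow field are hypothetical.

```python
import math


def gradient(field, axis):
    """Finite-difference derivative of a 2-D grid along 'x' (columns) or 'y' (rows).

    Central differences inside the grid, one-sided differences at the border.
    """
    h, w = len(field), len(field[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if axis == "x":
                lo, hi = max(c - 1, 0), min(c + 1, w - 1)
                diff = field[r][hi] - field[r][lo]
            else:
                lo, hi = max(r - 1, 0), min(r + 1, h - 1)
                diff = field[hi][c] - field[lo][c]
            out[r][c] = diff / (hi - lo) if hi > lo else 0.0
    return out


def optical_strain_magnitude(u, v):
    """Per-pixel strain magnitude sqrt(exx^2 + eyy^2 + 2 * exy^2)."""
    du_dx, du_dy = gradient(u, "x"), gradient(u, "y")
    dv_dx, dv_dy = gradient(v, "x"), gradient(v, "y")
    h, w = len(u), len(u[0])
    strain = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            exx, eyy = du_dx[r][c], dv_dy[r][c]
            exy = 0.5 * (du_dy[r][c] + dv_dx[r][c])  # shear term, exy == eyx
            strain[r][c] = math.sqrt(exx ** 2 + eyy ** 2 + 2 * exy ** 2)
    return strain


# Toy flow: horizontal stretch u = 0.5 * x, no vertical motion.
u = [[0.5 * c for c in range(5)] for _ in range(4)]
v = [[0.0] * 5 for _ in range(4)]
stretch = optical_strain_magnitude(u, v)

# A rigid translation deforms nothing, so its strain is zero everywhere.
shift = optical_strain_magnitude([[2.0] * 5 for _ in range(4)], v)
```

The contrast between the two toy fields shows why strain suits subtle facial motion: it responds to local deformation and ignores uniform displacement such as a small head shift.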
Micro-expression motion magnification: Global Lagrangian vs. local Eulerian approaches
Micro-expressions are difficult to spot but are utterly important for
engaging in a conversation or negotiation. Through motion magnification,
these expressions become much more distinguishable and easily recognized.
This work proposes Global Lagrangian Motion Magnification (GLMM) for
consistent exaggeration of facial expressions and dynamics across a whole
video. As the proposal takes an opposite approach to a previous pivotal
work, i.e. local Amplitude-based Eulerian Motion Magnification (AEMM), GLMM
and AEMM are theoretically analyzed for potential advantages and
disadvantages, especially with respect to how magnified noise and
distortions are dealt with. Then, both GLMM and AEMM are empirically
evaluated and compared using the CASME II micro-expression corpus.
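The difference between the two families of magnification can be shown with a toy contrast. A Lagrangian scheme follows points and scales their displacement from a reference position, while an Eulerian scheme stays at a fixed pixel and scales the temporal variation of its intensity. The sketch below is not GLMM or AEMM: the function names, the use of frame 0 as reference, the temporal mean as a crude substitute for a band-pass filter, and the sample data are all assumptions made for illustration.

```python
def magnify_lagrangian(tracks, alpha):
    """Scale each tracked point's displacement from its frame-0 position.

    tracks[t][k] is the (x, y) position of landmark k in frame t.
    """
    reference = tracks[0]
    return [[(x0 + alpha * (x - x0), y0 + alpha * (y - y0))
             for (x, y), (x0, y0) in zip(frame, reference)]
            for frame in tracks]


def magnify_eulerian(pixel_series, alpha):
    """Scale the temporal variation of intensity at one fixed pixel.

    The temporal mean stands in for the static component (an assumption;
    practical Eulerian methods use a temporal band-pass filter instead).
    """
    mean = sum(pixel_series) / len(pixel_series)
    return [mean + alpha * (value - mean) for value in pixel_series]


# Made-up data: one landmark moving half a pixel, and one pixel's intensity over time.
landmark_tracks = [[(10.0, 10.0)], [(10.5, 10.0)], [(10.0, 10.0)]]
magnified_tracks = magnify_lagrangian(landmark_tracks, 4.0)

intensity = [100.0, 102.0, 100.0, 98.0]
magnified_intensity = magnify_eulerian(intensity, 4.0)
```

In the Eulerian variant any noise in the intensity signal is scaled by the same factor as the motion, which relates to the abstract's concern about how magnified noise and distortions are handled.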